55 research outputs found
A Study of Deep Learning Robustness Against Computation Failures
For many types of integrated circuits, accepting larger failure rates in
computations can improve energy efficiency. We study the performance
of faulty implementations of certain deep neural networks based on pessimistic
and optimistic models of the effect of hardware faults. After identifying the
impact of hyperparameters such as the number of layers on robustness, we study
the ability of the network to compensate for computational failures through an
increase of the network size. We show that some networks can achieve equivalent
performance under faulty implementations, and quantify the required increase in
computational complexity.
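As an illustration of the kind of fault model the abstract describes, the sketch below injects random output failures into a toy two-layer network and compares the result to a fault-free evaluation. The function name `faulty_matmul` and the exact pessimistic/optimistic behaviours are illustrative assumptions, not the paper's definitions.

```python
import numpy as np

rng = np.random.default_rng(0)

def faulty_matmul(W, x, p_fault, pessimistic=True):
    """Matrix-vector product where each output fails with probability p_fault.

    Pessimistic model: a failed output is replaced by a random value.
    Optimistic model: a failed output is zeroed (fault detected and masked).
    Both are illustrative stand-ins for the paper's fault models.
    """
    y = W @ x
    faults = rng.random(y.shape) < p_fault
    if pessimistic:
        y[faults] = rng.uniform(-1.0, 1.0, size=faults.sum())
    else:
        y[faults] = 0.0
    return y

# Tiny two-layer ReLU network evaluated with and without faults.
W1 = rng.standard_normal((32, 16)) * 0.3
W2 = rng.standard_normal((4, 32)) * 0.3
x = rng.standard_normal(16)

clean = W2 @ np.maximum(W1 @ x, 0.0)
noisy = faulty_matmul(W2, np.maximum(faulty_matmul(W1, x, 0.05), 0.0), 0.05)
print(float(np.linalg.norm(clean - noisy)))  # deviation caused by faults
```

Sweeping `p_fault` and the layer widths in a loop of this kind is one way to probe the robustness-versus-size trade-off the abstract quantifies.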
Modeling and Energy Optimization of LDPC Decoder Circuits with Timing Violations
This paper proposes a "quasi-synchronous" design approach for signal
processing circuits, in which timing violations are permitted, but without the
need for a hardware compensation mechanism. The case of a low-density
parity-check (LDPC) decoder is studied, and a method for accurately modeling
the effect of timing violations at a high level of abstraction is presented.
The error-correction performance of code ensembles is then evaluated using
density evolution while taking into account the effect of timing faults.
Following this, several quasi-synchronous LDPC decoder circuits based on the
offset min-sum algorithm are optimized, providing a 23%-40% reduction in energy
consumption or energy-delay product, while achieving the same performance and
occupying the same area as conventional synchronous circuits.
Comment: To appear in IEEE Transactions on Communications
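The offset min-sum algorithm mentioned above uses a simple check-node update: each outgoing message takes the sign product of the other incoming LLRs and the minimum of their magnitudes, reduced by a fixed offset. The sketch below shows the standard textbook form with an illustrative offset, not the paper's optimized parameters.

```python
import numpy as np

def check_node_oms(llrs, offset=0.5):
    """Offset min-sum check-node update for one check node.

    For edge i, the outgoing message is the sign product of all other
    incoming LLRs times max(min_{j != i} |llr_j| - offset, 0).
    """
    llrs = np.asarray(llrs, dtype=float)
    signs = np.sign(llrs)
    signs[signs == 0] = 1.0
    total_sign = np.prod(signs)
    mags = np.abs(llrs)
    out = np.empty_like(llrs)
    for i in range(len(llrs)):
        others = np.delete(mags, i)
        # total_sign * signs[i] removes edge i's own sign from the product.
        out[i] = (total_sign * signs[i]) * max(others.min() - offset, 0.0)
    return out

print(check_node_oms([2.0, -1.0, 3.0]))  # [-0.5, 1.5, -0.5]
```

Timing violations in a quasi-synchronous decoder would perturb exactly these message computations, which is what the paper's high-level fault model captures.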
Layerwise Noise Maximisation to Train Low-Energy Deep Neural Networks
Deep neural networks (DNNs) depend on the storage of a large number of
parameters, which consumes an important portion of the energy used during
inference. This paper considers the case where the energy usage of memory
elements can be reduced at the cost of reduced reliability. A training
algorithm is proposed to optimize the reliability of the storage separately for
each layer of the network, while incurring a negligible complexity overhead
compared to a conventional stochastic gradient descent training. For an
exponential energy-reliability model, the proposed training approach can
decrease the memory energy consumption of a DNN with binary parameters by a
factor of 3.3 at iso-accuracy, compared to a reliable implementation.
Comment: To be presented at AICAS 202
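A minimal sketch of an exponential energy-reliability model of the type described, with purely illustrative numbers: the bit-error probability of a memory cell is assumed to decay exponentially with the energy invested per bit, so reaching a target error rate p costs energy proportional to -log(p). Layers that tolerate more errors can then be stored at lower energy.

```python
import math

E0 = 1.0  # illustrative energy scale (hypothetical units)

def energy_per_bit(p_err, e0=E0):
    """Exponential energy-reliability model: p_err = exp(-e / e0)."""
    return -e0 * math.log(p_err)

# A reliable baseline targets p_err = 1e-9 for every layer; a layerwise
# scheme tolerates larger error rates in robust layers (values illustrative).
baseline = energy_per_bit(1e-9)
layer_p = [1e-3, 1e-2, 1e-4]  # per-layer tolerated bit-error rates
optimized = sum(energy_per_bit(p) for p in layer_p) / len(layer_p)
print(baseline / optimized)  # energy reduction factor, here 3.0
```

The paper's training algorithm effectively learns the per-layer `layer_p` values jointly with the weights, rather than fixing them by hand as above.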
Sharpness-Aware Training for Accurate Inference on Noisy DNN Accelerators
Energy-efficient deep neural network (DNN) accelerators are prone to
non-idealities that degrade DNN performance at inference time. To mitigate such
degradation, existing methods typically add perturbations to the DNN weights
during training to simulate inference on noisy hardware. However, this often
requires knowledge about the target hardware and leads to a trade-off between
DNN performance and robustness, decreasing the former to increase the latter.
In this work, we show that applying sharpness-aware training by optimizing for
both the loss value and the loss sharpness significantly improves robustness to
noisy hardware at inference time while also increasing DNN performance. We
further motivate our results by showing a high correlation between loss
sharpness and model robustness. We show superior performance compared to
injecting noise during training and aggressive weight clipping on multiple
architectures, optimizers, datasets, and training regimes without relying on
any assumptions about the target hardware. This is observed on a generic noise
model as well as on accurate noise simulations from real hardware.
Comment: Preprint
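Sharpness-aware training as described above can be sketched as a two-step update: first ascend to a worst-case neighbour within a small radius, then descend using the gradient evaluated there. The toy quadratic loss and hyperparameter values below are illustrative, not the paper's training setup.

```python
import numpy as np

def sam_step(w, grad_fn, lr=0.1, rho=0.05):
    """One sharpness-aware minimisation (SAM) step (schematic).

    1. Ascend to the worst-case neighbour w + rho * g / ||g||.
    2. Descend using the gradient evaluated at that neighbour.
    """
    g = grad_fn(w)
    eps = rho * g / (np.linalg.norm(g) + 1e-12)
    g_sharp = grad_fn(w + eps)
    return w - lr * g_sharp

# Toy loss L(w) = ||w||^2 with gradient 2w.
grad = lambda w: 2.0 * w
w = np.array([1.0, -2.0])
for _ in range(50):
    w = sam_step(w, grad)
print(np.linalg.norm(w))  # settles near the (flat) minimum at the origin
```

Because the update seeks points whose whole neighbourhood has low loss, weight perturbations at inference time (hardware noise) move the model less, which is the correlation between sharpness and robustness the abstract reports.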
VLSI Implementation of Deep Neural Network Using Integral Stochastic Computing
The hardware implementation of deep neural networks (DNNs) has recently
received tremendous attention, as many applications require high-speed
operations that suit a hardware implementation. However, numerous elements and
complex interconnections are usually required, leading to a large area
occupation and copious power consumption. Stochastic computing has shown
promising results for low-power area-efficient hardware implementations, even
though existing stochastic algorithms require long streams that cause long
latencies. In this paper, we propose an integer form of stochastic computation
and introduce some elementary circuits. We then propose an efficient
implementation of a DNN based on integral stochastic computing. The proposed
architecture has been implemented on a Virtex-7 FPGA, resulting in 45% and 62%
average reductions in area and latency, respectively, compared to the best
architecture reported in the literature. We also synthesize the circuits in a 65 nm CMOS
technology and we show that the proposed integral stochastic architecture
results in up to 21% reduction in energy consumption compared to the binary
radix implementation at the same misclassification rate. Due to the
fault-tolerant nature of stochastic architectures, we also consider a
quasi-synchronous implementation, which yields a 33% reduction in energy
consumption w.r.t. the binary radix implementation without any compromise on
performance.
Comment: 11 pages, 12 figures
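The integral stochastic representation can be illustrated as follows: a conventional unipolar stream encodes a value in [0, 1] as a Bernoulli bit stream (multiplication becomes a bitwise AND), while an integral stream sums m binary streams to encode values in [0, m]. Stream length and values below are illustrative.

```python
import numpy as np

rng = np.random.default_rng(1)
N = 100_000  # stream length (illustrative)

def stream(x, n=N):
    """Unipolar stochastic stream: P(bit = 1) = x, with x in [0, 1]."""
    return (rng.random(n) < x).astype(np.int8)

# Conventional stochastic multiply: bitwise AND of independent streams.
a, b = 0.8, 0.5
prod = stream(a) & stream(b)
print(prod.mean())  # close to a * b = 0.4

# Integral stochastic stream for s = 1.3 > 1: split into m = 2 binary
# streams whose element-wise integer sum encodes s in [0, 2].
s = 1.3
s_stream = stream(s / 2) + stream(s / 2)
print(s_stream.mean())  # close to 1.3
```

The integer-valued stream sidesteps the [0, 1] range limit of plain stochastic computing while keeping its fault tolerance, which is what makes the quasi-synchronous variant in the abstract attractive.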
Learning Energy-Efficient Hardware Configurations for Massive MIMO Beamforming
Hybrid beamforming (HBF) and antenna selection are promising techniques for
improving the energy efficiency (EE) of massive multiple-input
multiple-output (mMIMO) systems. However, the transmitter architecture may
contain several parameters that need to be optimized, such as the power
allocated to the antennas and the connections between the antennas and the
radio frequency chains. Therefore, finding the optimal transmitter architecture
requires solving a non-convex mixed integer problem in a large search space. In
this paper, we consider the problem of maximizing the EE of fully digital
precoder (FDP) and hybrid beamforming (HBF) transmitters. First, we propose an
energy model for different beamforming structures. Then, based on the proposed
energy model, we develop an unsupervised deep learning method to maximize the
EE by designing the transmitter configuration for FDP and HBF. The proposed
deep neural networks can provide different trade-offs between spectral
efficiency and energy consumption while adapting to different numbers of active
users. Finally, to ensure that the proposed method can be implemented in
practice, we investigate the ability of the model to be trained exclusively
using imperfect channel state information (CSI), both for the input to the deep
learning model and for the calculation of the loss function. Simulation results
show that the proposed solutions can outperform conventional methods in terms
of EE while being trained with imperfect CSI. Furthermore, we show that the
proposed solutions are less complex and more robust to noise than conventional
methods.
Comment: This preprint comprises 15 pages and features 15 figures. Copyright
may be transferred without notice.
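A minimal sketch of the kind of energy model such an approach could optimize, with placeholder power figures (not the paper's model): total consumed power grows with the number of active RF chains, so a hybrid design can achieve higher EE even at a somewhat lower rate.

```python
def total_power(n_rf, p_rf=0.25, p_static=1.0, p_pa=2.0):
    """Transmitter power model: static + per-RF-chain + power-amplifier terms.
    All values are illustrative placeholders, not the paper's numbers."""
    return p_static + n_rf * p_rf + p_pa

def energy_efficiency(rate, n_rf):
    """EE = spectral efficiency / total consumed power."""
    return rate / total_power(n_rf)

# Fully digital precoder: one RF chain per antenna (64); hybrid: 8 chains.
ee_fdp = energy_efficiency(rate=30.0, n_rf=64)
ee_hbf = energy_efficiency(rate=27.0, n_rf=8)
print(ee_fdp, ee_hbf)  # HBF wins in EE despite a lower rate
```

In the paper's unsupervised scheme, a differentiable version of this EE expression serves directly as the (negated) training loss, so no labelled optimal configurations are needed.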
Relaxed Half-Stochastic Belief Propagation
Low-density parity-check codes are attractive for high throughput
applications because of their low decoding complexity per bit, but also because
all the codeword bits can be decoded in parallel. However, achieving this in a
circuit implementation is complicated by the number of wires required to
exchange messages between processing nodes. Decoding algorithms that exchange
binary messages are interesting for fully-parallel implementations because they
can reduce the number and the length of the wires, and increase logic density.
This paper introduces the Relaxed Half-Stochastic (RHS) decoding algorithm, a
binary message belief propagation (BP) algorithm that achieves a coding gain
comparable to the best known BP algorithms that use real-valued messages. We
derive the RHS algorithm by starting from the well-known Sum-Product algorithm,
and then derive a low-complexity version suitable for circuit implementation.
We present extensive simulation results on two standardized codes having
different rates and constructions, including low bit error rate results. These
simulations show that RHS can be an advantageous replacement for the existing
state-of-the-art decoding algorithms when targeting fully-parallel
implementations.
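The successive-relaxation principle used by RHS to turn a binary stochastic stream back into a soft value can be sketched as an exponential moving average of the incoming bits; the relaxation factor and step size below are illustrative, not the values derived in the paper.

```python
import numpy as np

def relaxed_update(llr, bit, beta=0.02, step=1.0):
    """Successive-relaxation tracker: exponential moving average that maps a
    stochastic bit stream to a soft estimate in [-step, +step].
    beta and step are illustrative; the paper derives its own parameters."""
    target = step if bit else -step
    return (1.0 - beta) * llr + beta * target

rng = np.random.default_rng(2)
llr = 0.0
# A stream with P(1) = 0.8 should drive the estimate toward (2*0.8 - 1) = 0.6.
for bit in (rng.random(5000) < 0.8):
    llr = relaxed_update(llr, bool(bit))
print(llr)  # hovers around 0.6
```

Because each node only ever receives single bits, the wiring between nodes stays narrow, while the tracker recovers the soft information that makes the coding gain comparable to real-valued BP.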
RSSI-Based Hybrid Beamforming Design with Deep Learning
Hybrid beamforming is a promising technology for 5G millimetre-wave
communications. However, its implementation is challenging in practical
multiple-input multiple-output (MIMO) systems because non-convex optimization
problems have to be solved, introducing additional latency and energy
consumption. In addition, the channel-state information (CSI) must be either
estimated from pilot signals or fed back through dedicated channels,
introducing a large signaling overhead. In this paper, a hybrid precoder is
designed based only on received signal strength indicator (RSSI) feedback from
each user. A deep learning method is proposed to perform the associated
optimization with reasonable complexity. Results demonstrate that the obtained
sum-rates are very close to the ones obtained with full-CSI optimal but complex
solutions. Finally, the proposed solution greatly increases the spectral
efficiency of the system compared to existing techniques, as minimal CSI
feedback is required.
Comment: Published in IEEE-ICC202
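As a rough illustration of the information available to such a model, the sketch below computes per-beam RSSI for a DFT codebook and selects the strongest beam; in the paper a deep network replaces this selection and designs a full hybrid precoder, and all dimensions here are illustrative.

```python
import numpy as np

rng = np.random.default_rng(3)

# Hypothetical setting: choose a beam from a DFT codebook using only
# per-beam received signal strength (RSSI), with no full CSI.
n_ant, n_beams = 16, 16
codebook = np.exp(
    2j * np.pi * np.outer(np.arange(n_ant), np.arange(n_beams)) / n_ant
) / np.sqrt(n_ant)

h = rng.standard_normal(n_ant) + 1j * rng.standard_normal(n_ant)  # channel
rssi = np.abs(h.conj() @ codebook) ** 2  # per-beam received power
best = int(np.argmax(rssi))
print(best, float(rssi[best]))
```

A learned model would map the `rssi` vector (possibly quantized, as in real feedback channels) to precoder parameters, avoiding both CSI estimation overhead and the non-convex optimization the abstract mentions.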
A relaxed half-stochastic decoding algorithm for LDPC codes
When considering error-correction codes for applications, the most important aspect of a coding scheme becomes the ratio of error-correction performance to cost. This work studies the decoding of LDPC codes and presents a new iterative decoding algorithm that represents likelihoods as binary stochastic streams, but uses some elements of the sum-product algorithm in its variable nodes. To convert the stochastic streams to a log-likelihood ratio representation, the algorithm uses the principle of successive relaxation. Because likelihoods are represented as stochastic streams, processing nodes only exchange 1-bit messages, which results in a low-complexity interleaver. Simulations show that the proposed algorithm achieves excellent error-correction performance and can outperform the floating-point sum-product algorithm.
We also study the problem of error floors in LDPC codes and the manner in which it can be addressed at the level of the decoder. After reviewing existing solutions, an alternative technique for lowering error floors is presented. The technique, called redecoding, relies on the randomized progress of a decoding algorithm to successfully decode problematic frames. Simulation of one code shows that redecoding removes the floor at least down to a bit error rate of 10^-12.
- …